Simple Queries as Distant Labels for Predicting Gender on Twitter
Abstract
Most research on extracting missing user attributes from social media profiles uses costly hand-annotated labels for supervised learning. Distantly supervised methods exist, although these generally rely on knowledge gathered from external sources. This paper demonstrates the effectiveness of gathering distant labels for self-reported gender on Twitter using simple queries. We confirm the reliability of this query heuristic by comparing it with manual annotation. Moreover, using these labels for distant supervision, we demonstrate performance competitive with models trained on manual annotations of the same data. As such, we offer a cheap, extensible, and fast alternative that can be employed beyond the task of gender classification.
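The query heuristic described in the abstract can be illustrated with a minimal sketch. The abstract does not list the actual queries, so the term lists, the "I'm a ..." pattern, and the function name `distant_label` below are all assumptions made for illustration, not the authors' method.

```python
import re

# Hypothetical self-report terms; the abstract does not specify the real queries.
FEMALE_TERMS = ("girl", "woman", "female")
MALE_TERMS = ("boy", "man", "guy", "male")


def _pattern(terms):
    """Build a case-insensitive pattern matching "I'm a <term>" / "I am a <term>"."""
    words = "|".join(terms)
    return re.compile(rf"\bi(?:'m| am) an? (?:{words})\b", re.IGNORECASE)


FEMALE_RE = _pattern(FEMALE_TERMS)
MALE_RE = _pattern(MALE_TERMS)


def distant_label(tweet):
    """Return 'f', 'm', or None when the tweet has no or conflicting self-reports."""
    text = tweet.replace("\u2019", "'")  # normalize curly apostrophes
    is_female = bool(FEMALE_RE.search(text))
    is_male = bool(MALE_RE.search(text))
    if is_female == is_male:
        # Either no self-report was found or both matched: leave unlabeled.
        return None
    return "f" if is_female else "m"
```

Tweets that match neither pattern, or both, are left unlabeled in this sketch; how the paper handles such cases is not stated in the abstract.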
Similar resources
Sentiment Analysis for Social Media
The proposed system collects useful information from Twitter and efficiently performs sentiment analysis of tweets regarding the smartphone war. The system uses an efficient scoring system to predict the user's age. The user's gender is predicted using a well-trained Naïve Bayes classifier. A sentiment classifier model labels each tweet with a sentiment. This helps in compreh...
Weakly Supervised User Profile Extraction from Twitter
While user attribute extraction on social media has received considerable attention, existing approaches, mostly supervised, encounter great difficulty in obtaining gold standard data and are therefore limited to predicting unary predicates (e.g., gender). In this paper, we present a weakly supervised approach to user profile extraction from Twitter. Users’ profiles from social media websites su...
Predicting Twitter User Demographics using Distant Supervision from Website Traffic Data
Understanding the demographics of users of online social networks has important applications for health, marketing, and public messaging. Whereas most prior approaches rely on a supervised learning approach, in which individual users are labeled with demographics for training, we instead create a distantly labeled dataset by collecting audience measurement data for 1,500 websites (e.g., 50% of ...
What's in a Name? Using First Names as Features for Gender Inference in Twitter
Despite significant work on the problem of inferring a Twitter user’s gender from her online content, no systematic investigation has been made into leveraging the most obvious signal of a user’s gender: first name. In this paper, we perform a thorough investigation of the link between gender and first name in English tweets. Our work makes several important contributions. The first and most ce...
Overview of the RUSProfiling PAN at FIRE Track on Cross-genre Gender Identification in Russian
Author profiling consists of predicting some author’s traits (e.g. age, gender, personality) from her writing. After addressing at PAN@CLEF mainly age and gender identification, in this RusProfiling PAN@FIRE track we have addressed the problem of predicting author’s gender in Russian from a cross-genre perspective: given a training set on Twitter, the systems have been evaluated on five differe...